Define a train/test threshold:

Example: 0.8 sends roughly 80% of the lines to the training set and the remaining 20% to the test set.


In [6]:
split = .8

Specify your input file:


In [7]:
file_origin = "/path/to/your/input/dataset.csv"

Specify your output files:


In [ ]:
file_train = "/path/to/your/output/train_dataset.csv"
file_test = "/path/to/your/output/test_dataset.csv"

Run the following Python code:


In [3]:
from random import random

In [8]:
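# Send each line to the training file with probability `split`,
# otherwise to the test file. The resulting sizes are approximate.
# Note: a CSV header row, if present, is treated like any other line.
with open(file_train, 'w') as train,\
     open(file_test, 'w') as test,\
     open(file_origin) as origin:
    for line in origin:
        if random() < split:
            train.write(line)
        else:
            test.write(line)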
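
If your input has a header row, or you need the same split on every run, the variant below may help. It is only a sketch under a few assumptions: the function name `split_csv` and its `seed` and `has_header` parameters are hypothetical and not part of the notebook above, and it assumes every record fits on a single line (no quoted fields containing line breaks).

```python
from random import Random


def split_csv(origin_path, train_path, test_path,
              split=0.8, seed=None, has_header=True):
    """Randomly split a line-oriented file into train and test files.

    Hypothetical helper: copies the header (if any) into both outputs
    and uses a private, seedable generator so a run can be repeated.
    Returns the number of data lines written to (train, test).
    """
    rng = Random(seed)  # same seed and same input give the same split
    n_train = n_test = 0
    # newline='' keeps the original line endings untouched
    with open(origin_path, newline='') as origin, \
         open(train_path, 'w', newline='') as train, \
         open(test_path, 'w', newline='') as test:
        if has_header:
            header = origin.readline()
            train.write(header)
            test.write(header)
        for line in origin:
            # Each line goes to train with probability `split`
            if rng.random() < split:
                train.write(line)
                n_train += 1
            else:
                test.write(line)
                n_test += 1
    return n_train, n_test
```

With the variables defined earlier, a call could look like `split_csv(file_origin, file_train, file_test, split=split, seed=42)`. Because each line is assigned independently, the returned counts will be close to, but usually not exactly, the requested proportion.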